
194 research outputs found

Comparative transcriptome analysis of tree Eucalyptus species using RNAseq technology: analysis of genes interfering in wood quality aspects

Author: A Bateman
A Conesa
BE Suzek
D Grattapaglia
DC Goncalves
ELO Camargo
GAG Pereira
J Lepikson-Neto
L Wang
LC Nascimento
M Kanehisa
MM Salazar
PJSL Teixeira
R Li
RO Vidal
SF Altschul
WL Marques
X Huang
Publication venue: BioMed Central
Publication date: 01/09/2011

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Representative Proteomes: A Stable, Scalable and Unbiased Proteome Set for Sequence Analysis and Functional Annotation

Author: AN Nikolskaya
BE Suzek
BJ Tindall
Cathy H. Wu
Chuming Chen
Darren A. Natale
EW Sayers
H Huang
Hongzhan Huang
JF Imhoff
Jian Zhang
Jörg D. Hoheisel
P Escobar-Paramo
P Flicek
R Leinonen
R Mazumder
Raja Mazumder
Robert D. Finn
S Hunter
SJ Sammut
T Gabaldon
Publication venue: Public Library of Science
Publication date: 01/04/2011

The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at 75%, 55%, 35% and 15% co-membership thresholds (CMT) are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT is also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.
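
The grouping idea in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the co-membership formula (shared clusters as a percentage of the smaller proteome), the single-linkage grouping, and the "largest proteome wins" selection rule are all assumptions made for illustration, and the proteome names and cluster IDs are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: proteome name -> set of cluster IDs it has members in.
proteomes = {
    "strain_1": {"c1", "c2", "c3", "c4", "c9"},
    "strain_2": {"c1", "c2", "c3", "c5"},
    "outlier": {"c1", "c6", "c7", "c8"},
}

def co_membership(clusters_a, clusters_b):
    """Shared clusters as a percentage of the smaller proteome (assumed definition)."""
    if not clusters_a or not clusters_b:
        return 0.0
    shared = len(clusters_a & clusters_b)
    return 100.0 * shared / min(len(clusters_a), len(clusters_b))

def group_proteomes(proteomes, cmt):
    """Single-linkage grouping: join two proteomes when co-membership >= cmt."""
    parent = {name: name for name in proteomes}

    def find(x):
        # Walk up to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(proteomes, 2):
        if co_membership(proteomes[a], proteomes[b]) >= cmt:
            parent[find(a)] = find(b)

    groups = {}
    for name in proteomes:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())

def pick_representative(group, proteomes):
    """Assumed stand-in for the selection step: the member covering the most clusters."""
    return max(sorted(group), key=lambda name: len(proteomes[name]))

print(group_proteomes(proteomes, 55))  # stricter threshold, finer groups
print(group_proteomes(proteomes, 15))  # looser threshold, coarser groups
```

Lowering the threshold merges more proteomes into each group, which is the granularity trade-off the abstract describes for the 75%, 55%, 35% and 15% sets.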

Crossref

Directory of Open Access Journals

PubMed Central

iSeeRNA: identification of long intergenic non-coding RNA transcripts from transcriptome sequencing data

Author: A Siepel
BE Suzek
CCLC Chang
E Byvatov
Hao Sun
Huating Wang
J Liu
Kun Sun
M Clamp
ME Dinger
MF Lin
P Carninci
P Kapranov
Peiyong Jiang
RT Arrial
SF Altschul
TR Mercer
Xiaofeng Song
Xiaona Chen
Z Wang
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

Species-level functional profiling of metagenomes and metatranscriptomes.

Author: A Sczyrba
A Shafquat
AE Duran-Pinedo
AK Sharma
B Buchfink
B Langmead
BE Suzek
BK Swan
C Burke
C Luo
Curtis Huttenhower
D Medini
DH Huson
DT Truong
E Pasolli
EA Franzosa
Eric A. Franzosa
George Weingart
GG Silva
Gholamali Rahnavard
H Hauswedell
J Kim
J Lloyd-Price
J Ravel
J. Gregory Caporaso
JA Fuhrman
K Huang
Karen Schwarzberg Lipson
Lauren J. McIver
LR Thompson
Luke R. Thompson
M Hamady
M Kanehisa
M Scholz
Melanie Schirmer
MY Galperin
N Segata
Nicola Segata
OU Mason
P Petrenko
PJ Turnbaugh
R Caspi
RC Edgar
RD Finn
Rob Knight
S Abubucker
S Nayfach
S Sunagawa
T Bose
UniProt Consortium.
W Huang
Y Ye
Y Zhao
Publication venue: eScholarship, University of California
Publication date: 01/11/2018

Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types.
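
The tiered strategy described in this abstract can be sketched in a few lines. This is a toy illustration only and does not use the HUMAnN2 software or its interfaces: real alignment is replaced by exact dictionary lookup, the codon table is deliberately partial, and every name below (tiered_profile, pangenomes, protein_index, the species and family labels) is hypothetical.

```python
from collections import Counter

# Partial codon table, enough for the toy reads below; unknown codons become "X".
CODONS = {"ATG": "M", "AAA": "K", "GGC": "G", "TTT": "F", "GAA": "E"}

def translate(read):
    """Translate a read in frame 0 only (a simplification of translated search)."""
    return "".join(CODONS.get(read[i:i + 3], "X") for i in range(0, len(read) - 2, 3))

def tiered_profile(reads, pangenomes, known_species, protein_index, translate):
    """Count reads per (gene family, species) using two search tiers."""
    # Tier 1 index: only pangenomes of species already identified in the community.
    nucleotide_index = {}
    for species in known_species:
        for sequence, family in pangenomes.get(species, {}).items():
            nucleotide_index[sequence] = (family, species)

    counts = Counter()
    for read in reads:
        hit = nucleotide_index.get(read)
        if hit is not None:
            counts[hit] += 1  # species-resolved assignment
            continue
        # Tier 2: translated lookup for reads that tier 1 could not place.
        family = protein_index.get(translate(read))
        if family is not None:
            counts[(family, "unclassified")] += 1
        else:
            counts[("UNMAPPED", "unclassified")] += 1
    return counts

# Hypothetical inputs.
pangenomes = {
    "species_A": {"ATGAAAGGC": "family_1"},
    "species_B": {"ATGTTTGAA": "family_2"},
}
known_species = {"species_A"}          # species_B was not detected in this community
protein_index = {"MFE": "family_2"}    # protein-level database for tier 2
reads = ["ATGAAAGGC", "ATGAAAGGC", "ATGTTTGAA", "CCCCCCCCC"]

profile = tiered_profile(reads, pangenomes, known_species, protein_index, translate)
for key, count in sorted(profile.items()):
    print(key, count)
```

Only reads that fail the cheaper nucleotide tier reach the translated tier, which is where the speed gain over pure translated search would come from in a design like this.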

Crossref

eScholarship - University of California

An integrated database of Eucalyptus spp. genome project

Author: A Bateman
BE Suzek
C Baudet
C Trapnell
Danieli Cristina Gonçalves
E Mizrachi
Eduardo Leal Oliveira Camargo
Gonçalo Amarante Guimarães Pereira
Jorge Lepikson Neto
L Wang
LB Koski
Leandro Costa Nascimento
M Ashburner
M Kanehisa
Marcela Mendes Salazar
Marcelo Falsarella Carazzolle
R Li
Ramon Oliveira Vidal
S Audic
SF Altschul
Wesley Leoricy Marques
X Huang
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

FastBLAST: Homology Relationships for Millions of Proteins

Author: A Marchler-Bauer
AA Schaffer
Adam P. Arkin
BE Suzek
Cecile Fairhead
CH Wu
CM Zmasek
D Wilson
F Pearl
H Mi
I Letunic
JD Selengut
LB Koski
M Remm
MN Price
Morgan N. Price
NJ Mulder
Paramvir S. Dehal
PS Dehal
R Durbin
RD Finn
RL Tatusov
S Yooseph
SF Altschul
W Gish
W Li
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. Methodology/principal findings: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. Conclusions/significance: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast
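
The second-stage idea, finding a query's likely homologs through shared family membership rather than by comparing it against every sequence, can be sketched as follows. This is a simplified illustration under stated assumptions, not the FastBLAST code: family assignments are given up front as plain sets, no alignments are computed, and all names are hypothetical.

```python
from collections import defaultdict

def build_family_index(protein_families):
    """Stage-1 stand-in: invert protein -> families into family -> member proteins."""
    index = defaultdict(set)
    for protein, families in protein_families.items():
        for family in families:
            index[family].add(protein)
    return index

def candidate_homologs(query, protein_families, index):
    """Stage-2 stand-in: collect proteins sharing at least one family with the query."""
    candidates = set()
    for family in protein_families.get(query, ()):
        candidates |= index[family]
    candidates.discard(query)  # a protein is not its own homolog here
    return candidates

# Hypothetical family assignments.
protein_families = {
    "p1": {"famA"},
    "p2": {"famA", "famB"},
    "p3": {"famB"},
    "p4": {"famC"},
}
index = build_family_index(protein_families)

print(sorted(candidate_homologs("p2", protein_families, index)))
print(sorted(candidate_homologs("p4", protein_families, index)))
```

In a scheme like this, the expensive pairwise alignment step would run only on the candidate set, not on the whole database.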

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Novel insights into the insect trancriptome response to a natural DNA virus

Author: BE Suzek
G Parra
Gaganjot Kaur
GW Blissard
JA Hoffman
Jennie S Garbutt
M Begon
M Boots
MA Larkin
Mike Boots
Seanna J McTaggart
Stephen Bridgett
TA Hall
Tidbury Hannah
TW Phillips
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Copyright © 2015 McTaggart et al.; licensee BioMed Central. Background: Little is known about invertebrate responses to DNA viruses. Here, we infect a commercially important pest moth species, Plodia interpunctella, with its naturally infecting DNA virus. We sequenced, assembled and annotated the complete transcriptome of the moth, and a partial transcriptome of the virus. We then tested for differential gene expression between moths that were exposed to the virus and controls. Results: We found 51 genes that were differentially expressed in moths exposed to a DNA baculovirus compared to controls. Gene set enrichment analysis revealed that cuticle proteins were significantly overrepresented in this group of genes. Interestingly, 6 of the 7 differentially expressed cuticle proteins were downregulated, suggesting that baculoviruses are able to manipulate their hosts' response. In fact, an additional 29 of the 51 genes were also downregulated in exposed compared with control animals, including a gram-negative binding protein. In contrast, genes involved in transposable element movement were upregulated after infection. Conclusions: We present the first experiment to measure genome-wide gene expression in an insect after infection with a natural DNA virus. Our results indicate that cuticle proteins might be key genes underpinning the response to DNA viruses. Furthermore, the large proportion of genes that were downregulated after viral exposure suggests that this virus is actively manipulating the insect immune response. Finally, it appears that transposable element activity might increase during viral invasion. Combined, these results provide much needed host candidate genes that respond to DNA viral invaders. NERC Biomolecular Analysis Facility (NBAF

Queen's University Belfast Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

Edinburgh Research Explorer

Open Research Exeter

A web-based bioinformatics interface applied to the GENOSOJA project: databases and pipelines

Author: Altschul SF
Audic S
Bateman A
Baudet C
Carazzolle MF
Cheng KCK
Dowell RD
Eliseu Binneck
Gonçalo Amarante Guimarães Pereira
Gustavo Gilson Lacerda Costa
Huang X
Jenkinson AM
Kanehisa M
Koski LB
Kulcheski FR
Leandro Costa do Nascimento
Li R
Marcelo Falsarella Carazzolle
Molina L
Rodrigues FA
Schmutz J
Smith TF
Soares-Cavalcanti NM
Suzek BE
Umezawa T
Wanderley-Nogueira AC
Wang L
Yorinori JT
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2012

Crossref

TMFoldRec: a statistical potential-based transmembrane protein fold recognition tool.

Author: A Fiser
A Lobley
A Oberai
A Ray
AA Canutescu
AJ Heim
AL Hopkins
AS Amin
BE Suzek
BE Weiner
D Fischer
DP Ng
Dániel Kozma
EJ Tarling
F Morcos
F Palmieri
G Studer
GE Tusnády
Gábor E. Tusnády
H Wang
I Sillitoe
J Ma
J Peng
J Stefková
J Söding
K Arnold
M Punta
M Remmert
MI Sadowski
MR Dorwart
P Barth
P Bradley
PA Insel
PD Thomas
RD Finn
S Kalyaanamoorthy
S Rust
SF Altschul
SR Eddy
T Nugent
T Schöneberg
V Yarov-Yarovoy
Y Zhang
Z Dosztányi
Z Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

BACKGROUND: Transmembrane proteins (TMPs) are key components of signal transduction, cell-cell adhesion, and energy and material transport into and out of cells. A deep understanding of these processes requires structure determination of transmembrane proteins. However, due to technical difficulties, only a few transmembrane protein structures have been determined experimentally. Large-scale genomic sequencing provides increasing amounts of sequence information on the proteins and whole proteomes of living organisms, which poses a challenge for bioinformatics: how structural information can be derived from a sequence. RESULTS: Here, we present a novel method, TMFoldRec, for fold prediction of membrane segments in transmembrane proteins. TMFoldRec, which is based on statistical potentials, was tested on a benchmark set containing 124 TMP chains from the PDBTM database. Using a 10-fold jackknife method, the native folds were correctly identified in 77% of the cases. This accuracy exceeds that of state-of-the-art methods. In addition, a key feature of the TMFoldRec algorithm is the ability to estimate the reliability of the prediction and to decide, with an accuracy of 70%, whether the obtained lowest-energy structure is the native one. CONCLUSION: These results imply that the membrane-embedded parts of TMPs, rather than the soluble parts, dictate the TM structures. Moreover, predictions with reliability scores make our algorithm applicable to proteome-wide analyses. AVAILABILITY: The program is available upon request for academic use.

Crossref

Springer - Publisher Connector

PubMed Central

Repository of the Academy's Library

Ribosome binding site recognition using neural networks

Author: Alberts B
Ana Tereza Ribeiro Vasconcelos
Braga AP
Carvalho LAV
Collado-Vides J
Daniele Quintella Mendes
Haykin S
Lewin B
Lodish HF
Luciana Itida Ferrari
Mitchell TM
Márcio Ferreira da Silva Oliveira
Salvini RL
Schneider TD
Shultzberger RK
Suzek BE
Wu CH
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2004

Crossref